Abstract 

The availability of Infrastructure-as-a-Service (IaaS) computing clouds gives researchers access to a 
large set of new resources for running complex scientific applications. However, exploiting cloud resources 
for large numbers of jobs requires significant effort and expertise. In order to make it simple and 
transparent for researchers to deploy their applications, we have developed a virtual machine resource 
manager (Cloud Scheduler) for distributed compute clouds. Cloud Scheduler boots and manages the 
user-customized virtual machines in response to a user's job submission. We describe the motivation 
and design of the Cloud Scheduler and present results on its use on both science and commercial clouds.

1 Introduction



Infrastructure as a Service (IaaS) cloud computing is emerging as a new and efficient way to provide
computing to the research community. Clouds are considered to be a solution to some of the problems
encountered with early adaptations of grid computing where the site retains control over the resources and
the user must adapt their application to the local operating system, software and policies. This often leads
to difficulties, especially when a single resource provider must meet the demands of multiple projects or
when projects cannot conform to the configuration of the resource provider. IaaS clouds offer a solution
to these challenges by delivering computing resources using virtualization technologies. Users lease the
resources from the provider and install their application software within a virtual environment. This frees 
the providers from having to adapt their systems to specific application requirements and removes the 
software constraints on the user applications. In most cases, it is easy for a user or a small project to 
create their virtual machine (VM) images and run them on IaaS clouds. However, the complexity rapidly 
increases for projects with large user communities and significant computing requirements. In this paper we 
describe a system that simplifies the use of IaaS clouds for High Throughput Computing (HTC) workloads. 
The growing interest in clouds can be attributed in part to the ease in encapsulating complex research 
applications in Virtual Machines (VMs), often with little or no performance degradation [1]. Studies have
shown, for example, that particle physics application code runs equally well in a VM or on the native system
[2]. Today, open source virtualization software such as Xen [3] and KVM [4] is incorporated into many
Linux operating system distributions, resulting in the use of VMs for a wide variety of applications. Often, 
special purpose servers, particularly those requiring high availability or redundancy, are built inside a VM 
making them independent of the underlying hardware and allowing them to be easily moved or replicated. 
It is also not uncommon to find an old operating system lacking the drivers for new hardware, a problem 
which may be resolved by running the old software in a virtual environment. 

The deployment and management of many VMs in an IaaS cloud is labour intensive. This can be 
simplified by attaching the VMs to a job scheduler and utilizing the VMs in a batch environment. The
Nimbus Project [5] has developed the one-click cluster solution. This provides a batch system on multiple
clouds using one type of VM [6].

We further simplify the management of VMs in an IaaS cloud and provide new functionality with the
introduction of Cloud Scheduler. Cloud Scheduler provides a means of managing user-customized VMs on
any number of science and commercial clouds (see footnote 1).

In the following sections we present the architecture of our system and highlight the role of Cloud
Scheduler in managing VMs on IaaS clouds in a batch HTC environment. We present early results on the
operation of the system and highlight its successes and issues. We conclude the paper with a discussion
of future developments.

2 System architecture 

The architecture of the HTC environment running user-customized VMs on distributed IaaS clouds is
shown in fig. 1. A user creates their VM and stores it in the VM image repository. They write a job
script that includes information about their VM and submit it to the Condor Job Scheduler. The Cloud
Scheduler reads the queues of the Condor Job Scheduler and requests that one of the available cloud
resources boot the user VM. Once booted, the VM advertises itself to the Condor Job Scheduler, which
then dispatches the user job to that VM.

In this section we describe the following components: VM image repository, the cloud resources and 
the Condor Job Scheduler. The Cloud Scheduler is discussed in more detail in the following section. 

VM image repository 

The user builds their VM by first retrieving a base image from a VM image repository. The base 
images are simple Linux VM images or images that include project or application based code. Once 
the user has modified their image, they will store it back in the VM image repository. The repository 
may reside at a single site or be distributed; however, it must be accessible to the cloud resources.

Cloud resources 

The system currently supports Amazon EC2 and IaaS clouds using Nimbus [5]. Support for IaaS 
clouds using OpenNebula [7] and Eucalyptus [8] is under development.

Job Scheduler 

The job scheduler used in the system is the Condor HTC job scheduler [9]. Condor was chosen
because it was designed to utilize heterogeneous idle workstations, which makes it well suited as a
job scheduler for a dynamic VM environment. Condor has a central manager which matches user
jobs to resources based on job and resource attributes. The Condor startd daemon must be installed
in the VM image and started when the VM is booted. The VM then advertises its existence to the
Condor central manager (see footnote 2). Users submit jobs by issuing the condor_submit command.
The user must add a number of additional parameters specifying the location and properties of the
VM. These parameters are described in Appendix I.

1 Commercial cloud providers include Amazon, RackSpace and IBM, to name a few. Science clouds use hardware resources 
funded by governments for research purposes and are located in universities or national laboratories. 

2 We use Condor Connection Brokering (CCB) to allow VM worker nodes that use Network Address Translation (NAT) to 
connect to the Condor Central Manager. 
Figure 1: An overview of the architecture used for the system. A user prepares their VM image and a job
script. The job script is submitted to the Condor Job Scheduler. The Cloud Scheduler reads the job queue
and makes a request to boot the user VM on one of the available clouds. Once there are no more user jobs
requiring that VM type, the Cloud Scheduler makes a request to the respective cloud to shut down the user
VM.



3 Cloud Scheduler 

The Cloud Scheduler is an object-oriented, Python-based package designed to manage VMs for jobs based
on the available cloud resources and job requirements. Users submit jobs to the Condor Job Scheduler
after they have been authenticated using X.509 Proxy Certificates [10]. The certificates are also used to
authenticate requests to start, shut down, or poll VMs on Nimbus clusters. Authentication with EC2 is
done by using a standard shared access key and secret key.

In the following subsections, we describe the Cloud Scheduler's object classes and how they are used to 
manage the VMs for the jobs. Finally we discuss the job scheduling and load balancing considerations. 

3.1 Resource and job management classes 

Cloud Scheduler keeps track of jobs and resources with a set of cloud and job management classes (see
Tables 1 and 2). The cloud management classes include the ResourcePool, Cluster and VM classes. The
ResourcePool is a list of cloud resources that is read on initialization, but can be updated at run-time. The 
Cluster class contains static information describing the properties of each cloud and a dynamic list of VM 
objects running on that cloud. The VM class contains information describing the properties and state of 
a VM. 

The job management classes include the JobPool and Job classes. The JobPool class contains a list of 
job objects that are derived from the jobs submitted to Condor. The Job class contains the properties of 
the user job. 
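
As an illustration only, the relationships between these classes can be sketched in Python as follows. This is a minimal sketch and not the Cloud Scheduler source: the class names follow the text, but the selection of fields, the snake_case attribute names, the default values and the helper methods free_slots and first_with_free_slot are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VM:
    """Properties and state of one virtual machine (subset of fields)."""
    name: str
    vmtype: str
    vmstate: str = "Starting"  # Starting, Running or Error
    memory: int = 512          # RAM in MB (illustrative default)
    cpucores: int = 1


@dataclass
class Cluster:
    """Static description of one cloud plus the VMs currently running on it."""
    name: str
    host: str
    cloud_type: str            # e.g. "Nimbus"
    vm_slots: int              # maximum number of VMs allowed
    vms: List[VM] = field(default_factory=list)

    def free_slots(self) -> int:
        # Hypothetical helper: slots not yet occupied by a VM.
        return self.vm_slots - len(self.vms)


@dataclass
class ResourcePool:
    """The list of clouds known to the scheduler."""
    clusters: List[Cluster] = field(default_factory=list)

    def first_with_free_slot(self) -> Optional[Cluster]:
        # Hypothetical helper: first cloud that can still boot a VM.
        for cluster in self.clusters:
            if cluster.free_slots() > 0:
                return cluster
        return None


@dataclass
class Job:
    """Properties of one user job as read from the Condor queue."""
    global_job_id: str
    user: str
    vmtype: str
    priority: int = 1


@dataclass
class JobPool:
    """Jobs grouped by whether a VM has been scheduled for them."""
    new_jobs: List[Job] = field(default_factory=list)
    scheduled_jobs: List[Job] = field(default_factory=list)
```

In this sketch a Cluster owns its list of VM objects, so the occupancy of a cloud can be derived from the length of that list rather than stored separately.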

3.2 VM management 

When Cloud Scheduler is started, it reads the general and cloud configuration files. It then starts the
following threads, which run periodically:

1. The JobPoller thread maintains the state and metadata of the jobs that are queued and running on 
the Condor Job Scheduler. It effectively maps the Condor Job Scheduler queue into the JobPool. 

2. The Scheduler thread starts and stops VMs based on the information in the JobPool, satisfying the 
resource demands of the workload. The design goal of Cloud Scheduler is to leave prioritization 
and scheduling decisions in the domain of the Condor Job Scheduler. However, the order in which 
the Scheduler thread provides resources can impact the scheduling algorithms of the Condor Job 
Scheduler. Job scheduling and load balancing are discussed in more detail in the following section. 

The Scheduler thread also monitors the VMs and, if necessary, updates the state of VMs in the 
JobPool. It will shut down VMs that are in an error state; if there are jobs that still require this VM, 
then the Scheduler will start a new instance of the VM to replace the one it has shut down. 

3. The CleanUp thread stops VMs that are no longer required. It can correct the state of the job in the 
JobPool. If a VM is shut down due to an error, then the CleanUp thread changes the state of the job 
in the JobPool from "scheduled" to "new" so that a new VM can be created for that job. 
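
The bookkeeping described for the Scheduler and CleanUp threads can be illustrated with a single, simplified pass over the job and VM lists. The sketch below is an assumption-laden illustration, not the actual implementation: jobs and VMs are plain dictionaries, the function name cleanup_pass is hypothetical, the handling of errored VMs (done by the Scheduler thread in the text) is folded into the same pass, and a scheduled job is returned to the "new" list whenever no healthy VM of its type remains.

```python
def cleanup_pass(new_jobs, scheduled_jobs, vms):
    """Return (new_jobs, scheduled_jobs, vms_to_stop) after one pass.

    Jobs are dicts with keys "id" and "vmtype"; VMs are dicts with keys
    "name", "vmtype" and "state". Inputs are not modified.
    """
    # VM types that some queued or scheduled job still needs.
    required = {job["vmtype"] for job in new_jobs + scheduled_jobs}

    vms_to_stop = []
    healthy_types = set()
    for vm in vms:
        if vm["state"] == "Error" or vm["vmtype"] not in required:
            # Errored VMs and VMs nobody needs are shut down.
            vms_to_stop.append(vm)
        else:
            healthy_types.add(vm["vmtype"])

    updated_new = list(new_jobs)
    updated_scheduled = []
    for job in scheduled_jobs:
        if job["vmtype"] in healthy_types:
            updated_scheduled.append(job)
        else:
            # "scheduled" -> "new": a replacement VM can be booted later.
            updated_new.append(job)

    return updated_new, updated_scheduled, vms_to_stop
```

Returning new lists rather than modifying the inputs keeps the pass easy to test; a threaded implementation would additionally need locking around the shared pools.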

When Cloud Scheduler is shut down, it can either shut down all the VMs that it has started, or it can
persist its state. In the latter case, the VMs continue to run the jobs. Cloud Scheduler reloads the state 
when it is restarted and resumes managing the jobs and resources. 



3.3 Job scheduling and load balancing 

The Condor Job Scheduler is designed to manage job prioritization and scheduling [9]. As mentioned,
Cloud Scheduler can impact Condor's job scheduling. For example, consider two queued jobs: the first
submitted job requires a VM of type VM-A, and the second requires a VM of type VM-B. If
Cloud Scheduler starts a VM-B first (because the resources to boot a VM-B are available), then jobs
requiring VM-B will run before jobs requiring VM-A.

Cloud Scheduler can be configured to take into account user fairness, resource utilization, and user 
priorities. Currently, Cloud Scheduler starts as many VMs as the available resources allow. The VMs are
evenly distributed among all users with jobs in the queue. As other users submit jobs, Cloud Scheduler 
re-balances the VM allocations by shutting down over-allocated VMs and starting under-allocated VMs. 
For example, a single user will get the full allocation of Cloud Scheduler's VMs, but once a second user 
submits jobs, half of the first user's VMs will be shut down to free resources for the second user. 
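
The even distribution described above can be sketched as a round-robin allocation of VM slots to users with queued jobs. The code is a minimal illustration under stated assumptions, not the algorithm used by Cloud Scheduler: the function names fair_share and rebalance_deltas are hypothetical, users are visited in alphabetical order, and no user receives more VMs than they have jobs.

```python
def fair_share(total_slots, jobs_per_user):
    """Distribute total_slots VM slots evenly among users with queued jobs."""
    allocation = {user: 0 for user, n in jobs_per_user.items() if n > 0}
    remaining = total_slots
    while remaining > 0:
        # Users who could still use another VM.
        hungry = [u for u in sorted(allocation)
                  if allocation[u] < jobs_per_user[u]]
        if not hungry:
            break
        for user in hungry:
            if remaining == 0:
                break
            allocation[user] += 1
            remaining -= 1
    return allocation


def rebalance_deltas(current, target):
    """VMs to start (positive) or shut down (negative) for each user."""
    users = sorted(set(current) | set(target))
    return {u: target.get(u, 0) - current.get(u, 0) for u in users}


# One user holds all 80 VMs until a second user submits jobs.
before = fair_share(80, {"alice": 500})
after = fair_share(80, {"alice": 500, "bob": 500})
deltas = rebalance_deltas(before, after)  # alice loses 40 VMs, bob gains 40
```

With 80 slots this reproduces the example in the text: the first user's allocation drops from 80 to 40 VMs when the second user arrives.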

Cloud Scheduler can re-balance the VM distribution by shutting down VMs gracefully or by killing them 
outright. When configured for graceful shutdown, Cloud Scheduler switches the Condor state of pending
jobs requiring the over-allocated VMs to "held", thus preventing them from being dispatched to
currently running VMs. The next VM of the over-allocated type to finish its job can then be shut down
without affecting running jobs. When Cloud Scheduler is configured to kill VMs outright, the VMs are
shut down immediately without waiting for the job to finish. If a VM is killed while a job is running, then
the Condor Job Scheduler re-queues the job for execution. 
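
The difference between the two shutdown modes can be illustrated by the choice of which VMs are retired at a given moment. This is a hypothetical sketch: the function select_vms_to_retire and the (name, is_busy) representation are assumptions, and preferring idle VMs in the kill mode is a design choice of the example rather than documented behaviour.

```python
def select_vms_to_retire(vms, excess, graceful=True):
    """Pick up to `excess` VMs of an over-allocated type to shut down now.

    vms is a list of (name, is_busy) tuples. In graceful mode only idle
    VMs are eligible; busy VMs are retired on a later pass once their
    job has finished. In kill mode busy VMs may be chosen as well.
    """
    idle = [name for name, is_busy in vms if not is_busy]
    busy = [name for name, is_busy in vms if is_busy]
    if graceful:
        return idle[:excess]
    # Assumption: prefer idle VMs so that fewer jobs are re-queued.
    return (idle + busy)[:excess]
```

For example, with vms = [("vm1", True), ("vm2", False), ("vm3", True)] and excess = 2, the graceful mode retires only vm2 on this pass, whereas the kill mode retires vm2 and vm1 immediately and the job on vm1 is re-queued by Condor.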

4 Results 

The system is currently being used for particle physics and astronomy applications. At the moment we
operate two independent systems, one for each research community. In addition, we are commissioning a cluster
at the University of Victoria that will be shared by a number of groups. 

4.1 Particle physics 

The system is being used to generate simulated particle physics events for the BaBar experiment based 
at the Stanford Linear Accelerator Center (SLAC). A number of faculty at the University of Victoria are 
members of the collaboration and one of the responsibilities of the group is to use Canadian computing 
resources for the generation of the simulated data. Up to now we have used standalone facilities and also 
a grid of facilities [11, 12].

The simulation application contains C++ and FORTRAN code. The current size of the VM is approximately
16 GB and it requires about 1 GB of RAM. The simulation requires access to calibration
databases that vary in size up to 2 TB. The application accesses the databases on a modest but regular
basis. The databases can be remote if the network connectivity is good. In Canada, we are able to use
the CANARIE network, which connects research and educational institutions with a multi-gigabit network
[13]. Our link to Amazon EC2 is through the commodity network and, for the time being, we keep a copy
of the databases on Amazon storage to overcome the network limitation. Typically the simulation requires
6 hours and produces an output file of 100 GB. All output files are copied back to the University of Victoria
where they are merged and sent to SLAC. 

The simulation production is operated by a single expert user. There are no other users of the cloud 
system. The user prepares the job scripts and submits them to the Condor Job Scheduler. The Cloud 
Scheduler manages 80 VMs provided by three clouds located at the University of Victoria, the National 
Research Council (NRC) of Canada in Ottawa and the Amazon EC2 cloud in the eastern US. We limit
the number of EC2 VMs due to the cost; however, we are exploring the use of EC2 on a variable basis
dependent on the price.

Table 1: The cloud management classes

ResourcePool class

    Object              Description
    ClusterList         The list of Cluster objects

Cluster class

    Object              Description
    name                The name of the cluster
    host                The hostname of the cluster
    cloud_type          The type of IaaS software (Nimbus, Eucalyptus, etc.)
    memory              The RAM available for a VM
    cpu_archs           The CPU architectures available
    networks            The network types available (private or public or both)
    vm_slots            The maximum number of VMs allowed on the cluster
    cpu_cores           The maximum number of CPUs allowed for a single VM
    storage             The scratch space available
    vms                 The VM list

VM class

    Object              Description
    name                The name assigned to the VM
    id                  The cluster-specific identifier
    vmtype              The type of the VM
    vmstate             The state of the VM (Starting, Running or Error)
    hostname            The hostname of the VM
    clusteraddr         The address of the IaaS head node that controls this VM
    network             The type of networking of the VM (private or public)
    cpuarch             The CPU architecture of the VM (i386 or x86_64)
    image               The VM image used to boot the VM
    memory              The VM RAM
    cpucores            The number of cores in the VM
    storage             The size of the scratch space used by the VM
    errorcount          The number of times the VM has given an error response
    lastpoll            The date/time of the latest update
    last_state_change   The date/time of the last VM state change

Table 2: The job management classes

JobPool class

    Object              Description
    NewList             List of new Job objects
    ScheduledList       List of scheduled Job objects

Job class

    Object              Description
    GlobalJobID         The Condor job ID
    User                The user that submitted the job
    Priority            The priority given in the job submission file (default = 1)
    VMType              The VMType
    VMNetwork           The network required (private/public)
    VMCPUArch           The CPU architecture (x86 or x86_64)
    VMName              The name of the image the job is to run on
    VMLoc               The location (URL) of the image
    VMAMI               The Amazon AMI of the image
    VMMem               The amount of memory in MB
    VMCPUCores          The number of CPU cores
    VMStorage           The amount of storage space

The system is performing very well, completing more than 2000 seven-hour jobs in approximately one
week. When jobs are submitted to the queue, the Cloud Scheduler successfully boots as many VMs as are
required. Once booted, the VMs successfully identify themselves to the job scheduler and begin to run
jobs in the queue. As expected, the three cloud sites become a single distributed cloud, and simulation
production proceeds exactly as it would on a traditional cluster.
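
As a rough consistency check of these figures, the expected wall-clock time can be estimated from the number of jobs, the job length and the number of VMs. The sketch assumes that each of the 80 VMs runs one job at a time and is never idle, and it ignores VM boot time and data transfer; the function name estimated_wall_days is hypothetical.

```python
def estimated_wall_days(jobs, hours_per_job, vms):
    """Idealized wall-clock days to run `jobs` jobs on `vms` single-job VMs."""
    cpu_hours = jobs * hours_per_job   # total compute required
    wall_hours = cpu_hours / vms       # perfectly parallel, no idle time
    return wall_hours / 24


# 2000 jobs x 7 h = 14000 CPU hours; on 80 VMs this is 175 h of wall time.
days = estimated_wall_days(2000, 7, 80)  # about 7.3 days
```

Under these assumptions the estimate of roughly 7.3 days agrees with the observed turnaround of approximately one week, which suggests the VMs were kept busy for most of the run.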

Issues that arose during the run were mainly centered around database access. To run efficiently, the 
jobs require fairly fast, low latency access to the databases. When the databases are hosted at the UVic 
site or the NRC site, the database access is over the CANARIE network described above, and the jobs 
are limited by the speed of the CPU rather than the I/O. However, when the jobs run on Amazon EC2, 
the network bandwidth between EC2 and the other two cloud sites is not sufficient for the jobs to run 
efficiently, and the jobs take approximately double the time to run to completion. This was solved by 
hosting a copy of the databases on Amazon Simple Storage Service (S3) and accessing that copy from jobs
running on EC2. 

Another issue involved the quality of the data being produced on EC2. The BaBar collaboration places 
strict checks on simulation production; any remote site that produces data for the collaboration must pass 
a series of tests that compare data produced at the remote site to a set of reference data produced at 
SLAC. When running on standard EC2 instances with older AMD CPUs, the data that was produced was 
different enough from the reference data to be unacceptable to the collaboration. Running the jobs on 
Amazon's "high CPU" instances with newer Intel chips solved this problem. 



4.2 Astronomy 

The system is also a central component of CANFAR [14], an astronomy project led by researchers at the
University of Victoria and the National Research Council of Canada Herzberg Institute of Astrophysics. 
The goal is to provide researchers the ability to create custom environments for analyzing data from survey 
projects. Currently the system is available to early adopters, using two clusters: one at the WestGrid 
facility located at the University of Victoria (25 machines, 200 cores) and one at the Herzberg Institute of 
Astrophysics (6 machines, 32 cores). The system has been tested successfully with over 9,000 jobs utilizing 
more than 33,000 core hours. Work is currently underway to streamline the process of creating and sharing
virtual machines, so that researchers can easily build VM images that suit their needs and are ready
for deployment on cloud resources.

5 Summary 

We have presented a new method for running large scale complex research applications on IaaS computing 
clouds. The development of Cloud Scheduler simplifies the management of virtual machine resources in a 
distributed cloud environment [15] by hiding the complexity of VM management. We have demonstrated 
that the system works for both astronomy and particle physics applications using multiple cloud resources. 
We have shown that the system is robust and fault tolerant. We described our plans to further develop 
Cloud Scheduler and address other issues for running HTC workloads in a cloud environment. 

The support of CANARIE, the Natural Sciences and Engineering Research Council, the National
Research Council of Canada and Amazon is acknowledged.



Appendix I: Job submission customization 

The user job submission script follows the Condor format; however, a number of custom attributes are
required to configure the VM. The following table lists the additional items. 



    Attribute (default)   Description
    VMType                Unique name of required VM
    VMLoc                 URL (Nimbus) of the image
    VMAMI                 AMI (EC2-like) of the image
    VMCPUArch (x86)       CPU architecture (x86 or x86_64)
    VMCPUCores (1)        Number of CPU cores
    VMStorage             Required storage space
    VMMem                 RAM required
    VMNetwork             Network required (public/private)



The VMType is a custom attribute of the VM advertised to the Condor central manager. It is specified
in the VM's condor_config or condor_config.local file.
A sample job script is listed below.



    # Regular Condor Attributes
    Universe   = vanilla
    Executable = script.sh
    Arguments  = one two three
    Log        = script.log
    Output     = script.out
    Error      = script.error
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    #
    # Cloud Scheduler Attributes
    Requirements =
    +VMType     = "vm-name"
    +VMLoc      = "http://repository.tld/your.vm.img.gz"
    +VMAMI      = "ami-dfasfds"
    +VMCPUArch  = "x86"
    +VMCPUCores = "4"
    +VMNetwork  = "private"
    +VMMem      = "512"
    +VMStorage  = "20"
    Queue
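
For users who generate many job scripts, the custom attributes listed in this appendix can be written programmatically. The following is a minimal sketch, not part of Cloud Scheduler: the function render_submit_file and its arguments are hypothetical, only the attribute names from the table above are accepted, and the Requirements expression and log/output settings are deliberately left out.

```python
# Custom attribute names taken from the table in this appendix.
ALLOWED_VM_ATTRIBUTES = {
    "VMType", "VMLoc", "VMAMI", "VMCPUArch",
    "VMCPUCores", "VMStorage", "VMMem", "VMNetwork",
}


def render_submit_file(executable, vm_attrs, arguments=""):
    """Return the text of a Condor submit file with Cloud Scheduler attributes.

    vm_attrs maps attribute names (without the leading "+") to values.
    """
    unknown = set(vm_attrs) - ALLOWED_VM_ATTRIBUTES
    if unknown:
        raise ValueError(f"unknown VM attributes: {sorted(unknown)}")

    lines = [
        "# Regular Condor Attributes",
        "Universe = vanilla",
        f"Executable = {executable}",
    ]
    if arguments:
        lines.append(f"Arguments = {arguments}")
    lines += [
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        "# Cloud Scheduler Attributes",
    ]
    for name, value in vm_attrs.items():
        # Custom attributes are prefixed with "+" and quoted as strings.
        lines.append(f'+{name} = "{value}"')
    lines.append("Queue")
    return "\n".join(lines) + "\n"
```

A call such as render_submit_file("script.sh", {"VMType": "vm-name", "VMMem": 512}) returns text that can be saved to a file and passed to condor_submit; rejecting unknown attribute names catches typos before the job reaches the queue.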